Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(runtime): Add series implementation for event recorder #1655

Open
wants to merge 9 commits into
base: main
Choose a base branch
from

Conversation

pando85
Copy link

@pando85 pando85 commented Dec 1, 2024

Motivation

The current implementation of the event recorder just creates new events which is not enough for my event handling expectations.

Solution

I added a recorder implementation more similar to the one in client-go library. This implementation caches events in local and groups isomorphic events to increment series count on similar events.

@pando85 pando85 force-pushed the feat/add-series-implementation-for-events-recorder branch 2 times, most recently from adeabe4 to e09e462 Compare December 3, 2024 23:58
Copy link
Member

@clux clux left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hey, thanks for this! looks largely sensible to me, gave a quick go over with some quick questions and minor nits, lmk if my thinking makes sense.

kube-runtime/src/events.rs Outdated Show resolved Hide resolved
kube-runtime/src/events.rs Outdated Show resolved Hide resolved
kube-runtime/src/events.rs Show resolved Hide resolved
kube-runtime/src/events.rs Outdated Show resolved Hide resolved
kube-runtime/src/events.rs Outdated Show resolved Hide resolved
kube-runtime/src/events.rs Outdated Show resolved Hide resolved
Comment on lines 339 to 346
cache.retain(|_, v| {
if let Some(series) = v.series.as_ref() {
series.last_observed_time.0 + EVENT_FINISH_TIME > now
} else if let Some(event_time) = v.event_time.as_ref() {
event_time.0 + EVENT_FINISH_TIME > now
} else {
true
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comment suggestion for why there's a retain at the end of a publisher

Suggested change
cache.retain(|_, v| {
if let Some(series) = v.series.as_ref() {
series.last_observed_time.0 + EVENT_FINISH_TIME > now
} else if let Some(event_time) = v.event_time.as_ref() {
event_time.0 + EVENT_FINISH_TIME > now
} else {
true
}
// gc past events older than now + CACHE_TTL
cache.retain(|_, v| {
if let Some(series) = v.series.as_ref() {
series.last_observed_time.0 + CACHE_TTL > now
} else if let Some(event_time) = v.event_time.as_ref() {
event_time.0 + CACHE_TTL > now
} else {
true
}

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, I put it there because I don't know where else to put it. We have to think of a better place for it.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I realize that in the end doesn't make sense if you have to invalidate a previous event and nothing more was executed. I changed it at the beginning but I would like to run it in a better place and I don't know where.

Also, I added a test in d85f318 which I don't like so much but I didn't want to change the API and add a clock to the structure for mocking the time.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, this is the event_recorder_cache_retain test? yeah, that feels a test that is too low (testing internals rather than interface). it would probably be more readable with a call to publish twice without all the internals.

you could perhaps make a mocking time/duration field conditional on #[cfg(test)] to avoid mucking with the interface. though it's private anyway.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will take a look when I have time. Thanks for the advice.

secondary: None,
},
&s.object_ref(&()),
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we are changing .publish, we should make the integration test also publish twice to the same name and verify we get an EventSeries

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree. I will try to add them soon.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed on 07145ab

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is actually tested still. We should have an actual event series, with a count of 2 if we publish twice, and i tried this in controller-rs with this branch and am not seeing event aggregation from kubectl describe.

maybe i am doing something wrong though. have outlined my steps in kube-rs/controller-rs#116 and set it up against your branch

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah. i'm guessing this is because the event key contains an ObjectReference for equality, and objectref contains a resourceVersion meaning the ones i am changing are actually "not the same object", and would not be aggregated.

not sure how i feel about that. in my mind the aggregation ought to be unique per GVK, but maybe that's not the standard way of doing it in kubernetes land 🤔

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed we are not using resourceVersion in Hash.

impl Hash for Reference {
    fn hash<H: Hasher>(&self, state: &mut H) {
        self.0.api_version.hash(state);
        self.0.kind.hash(state);
        self.0.name.hash(state);
        self.0.namespace.hash(state);
        self.0.uid.hash(state);
    }
}

I checked your code and saw that the recorder is initialized each reconcile cycle, so your cache is empty. I initialized it in the context creation.

On 07145ab we are testing to send it twice and check that series exists. We could add more checks or cases if needed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ahhh. that explains the changing of the oref argument as well. changing this to create it on the context creates the correct results;

  Normal  HideRequested  56s (x2 over 75s)  doc-controller  Hiding `samuel`

kube-runtime/src/events.rs Outdated Show resolved Hide resolved
@clux clux added the changelog-change changelog change category for prs label Dec 4, 2024
@clux clux added this to the 0.98.0 milestone Dec 4, 2024
@pando85 pando85 force-pushed the feat/add-series-implementation-for-events-recorder branch 2 times, most recently from 79074f2 to e8e4b54 Compare December 5, 2024 16:32
Copy link

codecov bot commented Dec 6, 2024

Codecov Report

Attention: Patch coverage is 98.56115% with 2 lines in your changes missing coverage. Please review.

Project coverage is 75.9%. Comparing base (60d3ebc) to head (9e30323).

Files with missing lines Patch % Lines
kube-runtime/src/events.rs 98.6% 2 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##            main   #1655     +/-   ##
=======================================
+ Coverage   75.6%   75.9%   +0.3%     
=======================================
  Files         82      82             
  Lines       7405    7496     +91     
=======================================
+ Hits        5591    5682     +91     
  Misses      1814    1814             
Files with missing lines Coverage Δ
kube-runtime/src/events.rs 98.8% <98.6%> (+1.6%) ⬆️

@pando85 pando85 force-pushed the feat/add-series-implementation-for-events-recorder branch 2 times, most recently from 41756e8 to 70a69af Compare December 6, 2024 09:44
@pando85 pando85 force-pushed the feat/add-series-implementation-for-events-recorder branch from 70a69af to 38ed567 Compare December 6, 2024 11:16
@pando85 pando85 force-pushed the feat/add-series-implementation-for-events-recorder branch from 96240f4 to dd650b2 Compare December 6, 2024 12:06
@pando85
Copy link
Author

pando85 commented Dec 6, 2024

Now all tests were passed but I forgot about DCO again...

You can review it when you have time and if more improvements are needed I will work on them. If not I have to squash all those commits and we can merge.

clux added a commit to kube-rs/controller-rs that referenced this pull request Dec 9, 2024
Copy link
Member

@clux clux left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the cleanups and sorry for the delay, have had a busy week. a few more small comments.

@@ -115,9 +158,10 @@ impl From<String> for Reporter {

impl From<&str> for Reporter {
fn from(es: &str) -> Self {
let instance = hostname::get().ok().and_then(|h| h.into_string().ok());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

had missed this earlier; how does this behave in-cluster in kubernetes? would this take the node name? presumably you are doing this for ambiguation of events in the case of multiple replicas (a not super common setup for controllers afaik).

it seems sensible enough to do, but maybe it's better to not do this implicitly and let people use the downward api to pass in the pod name instead of needing another dependency to get a less useful node name?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I had the same concerns as you have. I decided to go with this because hostname crate is really small and I don't expect it to change, and also because client-go behaves in this way.

In Kubernetes, it takes the pod name as Hostname and this is "useful" because even with one replica if you recreate the pod you can differentiate them.

We can remove it from here, I don't have a strong opinion here. Furthermore, it is pretty easy to initiate it as you prefer in your controller.

kube-runtime/src/events.rs Show resolved Hide resolved
kube-runtime/src/events.rs Show resolved Hide resolved
secondary: None,
},
&s.object_ref(&()),
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is actually tested still. We should have an actual event series, with a count of 2 if we publish twice, and i tried this in controller-rs with this branch and am not seeing event aggregation from kubectl describe.

maybe i am doing something wrong though. have outlined my steps in kube-rs/controller-rs#116 and set it up against your branch

kube-runtime/src/events.rs Outdated Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
changelog-change changelog change category for prs
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants